TMA 04 Cut-off date 16 May 2018


Questions 1 to 4 below, on Book 4 Bayesian statistics, form tutor-marked 
assignment M249 04. Question 1 (on Part I of Book 4) is marked out of 24, 
Question 2 (on Part II) is marked out of 25, Question 3 (on Part III) is 
marked out of 26, and Question 4 (on Part IV) is marked out of 25. 


Question 1 — 24 marks 


This question is intended to assess your use of relative frequencies and 
Bayes’ theorem, and your use and interpretation of prior and posterior 
distributions for the analysis of proportions. You should be able to answer 
this question after you have studied Part I of Book 4. 


In this question you will be required to supply a plot from LearnBayes for 
part (e)(i) only. This plot should be included in the body of your work at the 
relevant point, and you should include only what is relevant to the question 
and your answer. 


(a) At a clinic for patients with suspected malaria, 89 out of a total of 300
patients were found to be infected with malaria.


Use this information to estimate P(I = infected), where I represents the
malaria infection status (infected or not infected) of a randomly selected
patient attending this clinic.


(b) There are various blood tests that can be used to diagnose malaria. One 
particular test is ‘paracheck’, which is a rapid dipstick test. Let 


R = paracheck positive for malaria. 


For patients at this clinic, it is estimated that 

P(R | I = infected) = 0.910 and P(R | I = not infected) = 0.137. Use 
your answer to part (a) and Bayes’ theorem to estimate 

P(I = infected | R), the probability that when paracheck is positive for 
malaria, a patient at this clinic is infected with malaria. 


(c) An alternative diagnosis method is blood microscopy. Let 
M = microscopy positive for malaria. 


For patients at this clinic, it is estimated that 

P(M | R, I = infected) = 0.519 and 

P(M | R, I = not infected) = 0.207. Calculate the probability that a 
patient at this clinic who tested positive for malaria by both paracheck 
and microscopy is infected with malaria. 
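Part (c) can be handled as a sequential update: the posterior after the paracheck result becomes the prior for the microscopy result, with the likelihoods P(M | R, I) already conditioned on R. The minimal sketch below (function names and inputs are hypothetical) also checks that this gives the same answer as a single update using the products P(R | I) P(M | R, I).

```python
def bayes_update(prior, lik_infected, lik_not):
    """One Bayes' theorem step for a two-state status (infected / not infected)."""
    num = lik_infected * prior
    return num / (num + lik_not * (1 - prior))


def sequential_posterior(prior, likelihood_pairs):
    """Apply successive test results, each posterior becoming the next prior.

    Each pair is (P(result | infected, earlier results),
                  P(result | not infected, earlier results)).
    """
    p = prior
    for lik_infected, lik_not in likelihood_pairs:
        p = bayes_update(p, lik_infected, lik_not)
    return p


# Hypothetical inputs: prior 0.1, first test (0.9, 0.2), second test (0.6, 0.3).
seq = sequential_posterior(0.1, [(0.9, 0.2), (0.6, 0.3)])
joint = bayes_update(0.1, 0.9 * 0.6, 0.2 * 0.3)  # one step with multiplied likelihoods
print(round(seq, 4), round(joint, 4))  # 0.5 0.5
```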
(d) Figure 3 shows four possible priors for the probability θ that a patient
at this clinic is infected with malaria. 
Figure 3 Four priors


(i) What is the most likely value of θ suggested by prior D? What is
the smallest value of θ that is likely given prior D? And what is the
largest? (You need give these values only to one decimal place.)


(ii) Order the four priors from strong to weak. 


(e) Suppose now that before the malaria status of the 300 patients at this
clinic had been determined, the most likely value for θ was 0.4, and that
θ was unlikely to be less than 0.3 or greater than 0.6. Use the data
given in part (a) to answer the following.


(i) Use LearnBayes to produce a plot of the prior, likelihood and 
posterior. Include a copy of this plot in your answer. 


(ii) What is the most likely value of θ based on the posterior density?
State which posterior summary this is based on, as well as the
value itself.


(iii) Compare the value of θ given in part (e)(ii) with the value of θ
estimated in part (a). How similar are they? Explain carefully how
the plot that you produced in part (e)(i) can be used to estimate
these two values of θ. Does the plot produced in part (e)(i) lead to
the same conclusion about these two estimates? Why or why not?


(iv) Which is stronger, the prior or the likelihood? Justify your answer. 
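LearnBayes draws the prior, likelihood and posterior for part (e)(i). As a hedged sketch of what lies behind such a plot, the code below evaluates all three on a grid (posterior ∝ prior × likelihood) and reports where each peaks. It assumes a hypothetical Beta(9, 13) prior (mode 0.4) and hypothetical data of 30 infected out of 100; neither is the prior LearnBayes would choose for this question nor the clinic's data.

```python
import math


def log_kernel(theta, a, b):
    # log of theta^(a-1) * (1-theta)^(b-1); the normalising constant is omitted
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)


def grid_argmax(grid, values):
    # grid point at which `values` is largest
    best = max(range(len(values)), key=values.__getitem__)
    return grid[best]


def prior_lik_post_modes(a, b, x, n, step=0.001):
    """Modes of a Beta(a, b) prior, a binomial likelihood (x of n) and their product."""
    grid = [i * step for i in range(1, round(1 / step))]      # excludes 0 and 1
    log_prior = [log_kernel(t, a, b) for t in grid]
    log_lik = [log_kernel(t, x + 1, n - x + 1) for t in grid]  # theta^x (1-theta)^(n-x)
    log_post = [p + l for p, l in zip(log_prior, log_lik)]    # prior x likelihood, on log scale
    return (grid_argmax(grid, log_prior),
            grid_argmax(grid, log_lik),
            grid_argmax(grid, log_post))


# Hypothetical prior Beta(9, 13) and hypothetical data: 30 of 100.
print(prior_lik_post_modes(9, 13, 30, 100))
```

Because this prior is conjugate, the exact posterior here is Beta(39, 83), whose mode (39 − 1)/(39 + 83 − 2) ≈ 0.317 lies between the prior mode (0.4) and the likelihood peak (0.3); the grid reproduces this to within one grid step.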
Question 2 — 25 marks


This question is intended to assess your use and understanding of standard 
priors, conjugate analyses and Bayesian inference, including credible 
intervals. You should be able to answer this question after working through 
Part II of Book 4. 


In this question you will be required to supply a plot from LearnBayes for 
part (c)(i) only. This plot should be included in the body of your work at the 
relevant point, and you should include only what is relevant to the question 
and your answer. 


Every year in the run-up to Christmas, many people in the UK speculate 
about whether there will be a ‘White Christmas’; that is, snow on Christmas 
Day. 


There are many definitions of what exactly constitutes an official ‘White 
Christmas’. For the purposes of this question, assume that ‘White Christmas’ 
simply means snow or sleet falling in Glasgow sometime on Christmas Day. 


(a) Suppose that θ represents P(next Christmas is a White Christmas).
What is your assessment of the most likely value for θ? Also, what are
your assessments for the upper and lower quartiles of θ? Briefly describe
the reasoning that you used to make your assessments. [3]


(b) Suppose that another student, Chris, assesses the most likely value of θ
to be 0.25, the lower quartile to be 0.20 and the upper quartile to
be 0.40. It is decided to represent Chris’s prior beliefs by a Beta(a, b)
distribution. Use LearnBayes to answer the following.


(i) Give the parameters of the Beta(a, b) distribution that best 
matches Chris’s assessments. [2] 


(ii) Is the best matching Beta(a, b) distribution that you specified in 
part (b)(i) a good representation of Chris’s prior beliefs? Why or 
why not? [4] 


(c) In the years 1918 to 2009, a period of 92 years, there were 11 Christmas 
Days in Glasgow that were officially ‘white’. (Assume that the 
probability of a White Christmas is independent of the weather 
conditions for any other Christmas Day. Also assume that there has 
been no change in climate and hence that the probability of a White 
Christmas has not changed during this period.) 


(i) Produce a plot of Chris’s prior for θ along with the likelihood and
posterior. Compare the posterior with Chris’s prior. How have
Chris’s beliefs about the probability of a White Christmas changed
in the light of these data? [7]

(ii) Give a 99% highest posterior density credible interval for θ. Why is
this interval not the same as the 99% equal-tailed credible interval? [2]
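LearnBayes reports both intervals directly. To show how they differ, the sketch below approximates them from posterior draws: the equal-tailed interval cuts the same share of draws from each tail, while the shortest window covering the required share approximates the highest posterior density (HPD) interval when the posterior is unimodal. The Beta(3, 20) posterior is hypothetical and chosen only because it is clearly right-skewed; draws come from the standard library's `random.betavariate`.

```python
import random


def credible_intervals(samples, level=0.99):
    """Equal-tailed and shortest (HPD-style) intervals from posterior draws."""
    s = sorted(samples)
    n = len(s)
    k = int(round(level * n))       # number of draws each interval must cover
    tail = (n - k) // 2
    equal_tailed = (s[tail], s[tail + k - 1])  # same share of draws cut from each tail
    # Shortest window of k consecutive sorted draws; for a unimodal
    # posterior this approximates the highest posterior density interval.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    hpd = (s[start], s[start + k - 1])
    return equal_tailed, hpd


# Hypothetical right-skewed posterior Beta(3, 20), approximated by 20 000 draws.
rng = random.Random(2018)
draws = [rng.betavariate(3, 20) for _ in range(20000)]
equal_tailed, hpd = credible_intervals(draws, 0.99)
print(equal_tailed, hpd)
```

For a symmetric posterior the two intervals coincide; the more skewed the posterior, the further the HPD interval shifts towards the mode relative to the equal-tailed one.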


(iii) The posterior for θ is a beta distribution. Why? Calculate the
parameters of the beta distribution. (Note that you will have to do
this by hand as these parameters are not given by LearnBayes.) [3]
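The general beta–binomial conjugate rule behind parts like (c)(iii) can be sketched in a few lines: a Beta(a, b) prior multiplied by a binomial likelihood θ^x (1 − θ)^(n − x) gives a kernel of the same beta form. The function name is illustrative; the check uses the uniform-prior result stated in Question 3(b)(i).

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials`.

    Prior Beta(a, b) times binomial likelihood theta^x (1 - theta)^(n - x)
    gives a kernel theta^(a + x - 1) (1 - theta)^(b + n - x - 1),
    i.e. Beta(a + x, b + n - x).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return a + successes, b + trials - successes


# Check against Question 3(b)(i): uniform prior Beta(1, 1), 340 of 945 males chose maths.
print(beta_binomial_update(1, 1, 340, 945))  # (341, 606)
```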
(d) For each of the following, which of the standard models for a conjugate
analysis is most likely to be appropriate? 


(i) Estimation of the proportion of UK households that entertain
guests at home next Christmas Day.

(ii) Estimation of the number of couples in Glasgow who become
engaged next Christmas Day.

(iii) Estimation of the minimum outside temperature in Glasgow (in
degrees Celsius) next Christmas Day.

(iv) Estimation of the proportion of UK households where at least one
meal next Christmas Day contains turkey.


Question 3 — 26 marks 


This question is intended to assess your understanding of stochastic 
simulation and its application using WinBUGS. You should be able to 
answer this question after working through Part III of Book 4. 


In this question you will be required to supply plots from WinBUGS for 
parts (a)(i), (a)(ii) and (b)(ii) only. These plots should be included in the 
body of your work at the relevant point, and you should include only what is 
relevant to the question and your answer. 


(a) In 1882, Simon Newcomb set up an experiment to measure the speed of
light by measuring the amount of time required for light to travel
7442 metres. The data recorded were deviations from
24 800 nanoseconds. The 66 measurements made by Newcomb had
sample mean 26.21 and sample standard deviation 10.75. It is
reasonable to model these data by a normal distribution with mean μ
and precision τ. In this case, the conjugate joint prior for (μ, τ) is the
normal-gamma prior, with resulting normal posterior for μ | τ and
gamma posterior for τ. If, for simplicity, improper uniform priors are
used for μ and τ, then these will also result in a normal posterior for μ | τ
and a gamma posterior for τ. Using the data from the experiment
produces the following posteriors:


μ | τ, data ~ N(26.21, 1/(66τ)),

τ | data ~ Gamma(32.5, 3756).


(You do not need to know how the posterior parameter values were 
calculated.) 


(i) In WinBUGS, open the file newcomb.odc. Check and compile the
model. Get WinBUGS to generate all necessary initial values, and
specify that both mu and tau should be monitored. Generate 2000
sample values from the joint posterior for μ and τ. Obtain a
scatterplot of the simulated values of μ and τ, and include a copy
of this plot in your answer. Describe the relationship between the
two parameters.

(ii) Summarize the marginal posterior distribution for μ by means of a
kernel density estimate, and include a copy of this plot in your
answer. Report estimates of the marginal posterior mean and the
equal-tailed 95% credible interval for μ.

(iii) Has the simulation been run for long enough to produce an
accurate estimate of μ? Explain your answer.
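WinBUGS does this simulation for you. As a hedged cross-check of the idea, the sketch below uses composition sampling: draw τ from its marginal posterior, then μ from its conditional posterior given that τ. It assumes that Gamma(32.5, 3756) is shape/rate (so the scale passed to `random.gammavariate` is 1/3756) and that μ | τ, data ~ N(26.21, 1/(66τ)) uses the variance parameterization; both readings are assumptions. The Monte Carlo standard error sd/√N indicates how precisely the simulated mean estimates the posterior mean.

```python
import math
import random
import statistics


def simulate_newcomb(n_draws, seed=1882):
    """Composition sampling: draw tau from its marginal, then mu given tau."""
    rng = random.Random(seed)
    mus, taus = [], []
    for _ in range(n_draws):
        # Gamma(32.5, 3756) read as shape 32.5, rate 3756; gammavariate takes a scale
        tau = rng.gammavariate(32.5, 1 / 3756)
        # mu | tau, data ~ N(26.21, 1/(66 tau)); gauss takes a standard deviation
        mu = rng.gauss(26.21, math.sqrt(1 / (66 * tau)))
        taus.append(tau)
        mus.append(mu)
    return mus, taus


def summarize(values):
    """Posterior mean, Monte Carlo standard error and equal-tailed 95% interval."""
    s = sorted(values)
    n = len(s)
    mean = statistics.fmean(s)
    mc_se = statistics.stdev(s) / math.sqrt(n)  # shrinks like 1/sqrt(n)
    return mean, mc_se, (s[int(0.025 * n)], s[int(0.975 * n) - 1])


mus, taus = simulate_newcomb(2000)
print(summarize(mus))
```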
(iv) Based on the current accepted speed of light, the ‘true’ value for μ
in Newcomb’s experiment is 33.02. What does this suggest about
Newcomb’s data?


(b) In England, A levels are one of the types of qualifications offered by 
schools and colleges for students aged 16-19 years. Typically, students 
choose three subjects to study at A level. There were, however, concerns 
that there is a gender imbalance for some A level subjects. Cambridge 
Assessment (an exam group that designs and delivers assessments) 
undertook a questionnaire survey in 2006 to investigate A level subject 
choices made by students in England. They surveyed 945 male A level 
students and 1478 female A level students from a variety of ethnic 
backgrounds and school types (namely comprehensive, grammar, 
independent, sixth form colleges, tertiary colleges and further education 
colleges). The proportions of student gender and school type were 
representative of the A level population as a whole. 


The survey found that 340 of the male students chose maths and 187 
chose chemistry, while 288 of the female students chose maths and 270 
chose chemistry. Define the following parameters: 


θ_M = proportion of males choosing A level maths,
θ_F = proportion of females choosing A level maths,
ψ_M = proportion of males choosing A level chemistry,
ψ_F = proportion of females choosing A level chemistry.


(i) For simplicity, suppose that independent noninformative uniform
priors U(0, 1) are defined for all four parameters, θ_M, θ_F, ψ_M
and ψ_F. To which beta prior is U(0, 1) equivalent? Hence show
that the posterior distributions for the four parameters are

θ_M | data ~ Beta(341, 606),
θ_F | data ~ Beta(289, 1191),
ψ_M | data ~ Beta(188, 759),
ψ_F | data ~ Beta(271, 1209).


(ii) In WinBUGS, open the file Alevels.odc. Check and compile the 
model. Get WinBUGS to generate all necessary initial values, and 
specify that diff_maths, diff_chemistry, diff_male and 
diff_female should all be monitored. Generate 5000 sample
values from the joint posterior distribution. Show estimates of the 
marginal posterior distributions for diff_maths and 
diff_chemistry by means of two kernel density plots. Include 
copies of the kernel density plots in your answer. What do you 
conclude from these kernel density plots regarding gender 
imbalance in A level subject choices for maths and chemistry? 


(iii) Report estimates of the equal-tailed 95% credible intervals for 
diff_male and diff_female. Briefly explain what these credible 
intervals suggest about the proportions of males and females 
choosing maths and chemistry. 
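The WinBUGS model in Alevels.odc is not reproduced here, so the definitions of the diff_ nodes are an assumption: the sketch below takes diff_maths to be θ_M − θ_F. Under that assumption, and with the independent beta posteriors stated in part (i), a difference of two proportions can be summarized by simulating from each beta posterior and subtracting. The function name and seed are hypothetical.

```python
import random


def difference_summary(post1, post2, n_draws=5000, seed=2006):
    """Monte Carlo summary of theta1 - theta2 for independent beta posteriors.

    post1, post2 -- (a, b) parameters of the two beta posteriors.
    Returns the mean difference, an equal-tailed 95% interval and P(difference > 0).
    """
    rng = random.Random(seed)
    diffs = sorted(rng.betavariate(*post1) - rng.betavariate(*post2)
                   for _ in range(n_draws))
    mean = sum(diffs) / n_draws
    interval = (diffs[int(0.025 * n_draws)], diffs[int(0.975 * n_draws) - 1])
    prob_positive = sum(d > 0 for d in diffs) / n_draws
    return mean, interval, prob_positive


# Assumed definition: diff_maths = theta_M - theta_F (males minus females, maths).
print(difference_summary((341, 606), (289, 1191)))
```

An interval that excludes 0 by a wide margin points to a genuine difference between the two proportions; one that straddles 0 does not.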
Question 4 — 25 marks


This question is intended to assess your understanding of Markov chain 
Monte Carlo (MCMC) and its application using WinBUGS. You should be 
able to answer this question after working through Part IV of Book 4. 


In this question you will be required to supply output from WinBUGS for 
part (d) only. These plots should be included in the body of your work at the 
relevant point, and you should include only what is relevant to the question 
and your answer. 


Monitoring feedwater flow loss is important in the risk assessment of nuclear 
plants. Table 8 gives the number of times that feedwater flow was lost in 
each of 10 nuclear generating systems, and the time that each system was in 
operation. 


Table 8 Loss of feedwater flow

Generating system   Number of feedwater flow losses   Time in operation (years)
i                   x_i                               t_i
1                   4                                 15
2                   40                                12
3                   0                                 8
4                   10                                8
5                   14                                6
6                   31                                5
7                   2                                 5
8                   4                                 4
9                   13                                4
10                  4                                 3
For each i = 1, ..., 10, x_i denotes the number of times that feedwater flow
was lost in generating system i over a period of t_i years. Each x_i is an
observation of the random variable X_i, which is modelled by a Poisson
distribution, so that for i = 1, ..., 10,

X_i ~ Poisson(λ_i),

where λ_i is the expected number of flow interruptions for generating system i
over a period of t_i years.


(a) For each i = 1, ..., 10, let θ_i be the number of feedwater flow losses per
year for system i. A prior distribution was defined by putting

λ_i = θ_i t_i

and letting all the θ_i have the same prior distribution,

θ_i ~ Gamma(α, β).


(i) Explain why it is reasonable to assume that all the θ_i have the
same prior, but it may not be reasonable to assume that all the λ_i
have the same prior. [3]

(ii) Give one reason for preferring to give θ_i a gamma distribution
rather than a normal distribution. [1]
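A gamma prior keeps each θ_i positive and is conjugate to the Poisson likelihood. The sketch below shows the resulting conditional posterior, assuming Gamma(α, β) uses the shape/rate form: with X_i ~ Poisson(θ_i t_i), θ_i | α, β, x_i ~ Gamma(α + x_i, β + t_i). This is the kind of full conditional a Gibbs-type MCMC sampler updates while α and β are updated separately. The fixed values α = β = 1 are hypothetical and for illustration only; in the model they are unknown.

```python
# Table 8 data: losses x_i and years in operation t_i for systems 1..10
x = [4, 40, 0, 10, 14, 31, 2, 4, 13, 4]
t = [15, 12, 8, 8, 6, 5, 5, 4, 4, 3]


def theta_conditionals(alpha, beta, x, t):
    """Gamma(alpha + x_i, beta + t_i) parameters (shape, rate) for each theta_i.

    With X_i ~ Poisson(theta_i * t_i) and theta_i ~ Gamma(alpha, beta) (rate beta),
    the posterior kernel is theta^(alpha + x_i - 1) * exp(-(beta + t_i) * theta).
    """
    return [(alpha + xi, beta + ti) for xi, ti in zip(x, t)]


# Hypothetical fixed hyperparameters alpha = beta = 1, purely for illustration.
for i, (shape, rate) in enumerate(theta_conditionals(1, 1, x, t), start=1):
    theta_mean = shape / rate
    print(f"system {i}: theta mean {theta_mean:.3f}, lambda mean {theta_mean * t[i - 1]:.3f}")
```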
The parameters α and β specifying the prior for θ_i are also unknown, so
priors are also required for these parameters. Little expert information
was available for either of the parameters α and β, so noninformative
prior distributions were defined for both parameters.


(b) Open the WinBUGS file feedwater.odc. Write down the lines of code
that specify the priors for α and β. Which uniform prior distribution is
specified for α? Why can it be considered noninformative?


(c) The flow interruption data for the 10 systems are contained in the
WinBUGS file feedwater data.odc. Check the model, load the data, 
compile the model, and generate initial values for three chains. Specify 
that the nodes alpha, beta and lambda should be monitored, and 
generate three chains of simulated values, each of 2000 iterations. 


Do you think that WinBUGS used a special MCMC adapting algorithm 
to do the sampling? Justify your answer. 


(d) With reference to suitable trace plots, discuss what number of iterations
seems suitable for the burn-in phase. It is necessary to include trace 
plots only for alpha and beta in your answer, though your discussion 
should also mention those for lambda. 
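Trace plots are the check the question asks for. As a complementary numerical check when several chains are run, the sketch below computes the Gelman–Rubin potential scale reduction factor after discarding a burn-in: it compares between-chain and within-chain variability, and values near 1 suggest the chains have forgotten their starting points. WinBUGS offers a related diagnostic through its bgr diag option; the implementation and demo chains below are hypothetical stand-ins, not WinBUGS output.

```python
import random
import statistics


def gelman_rubin(chains, burn_in=0):
    """Potential scale reduction factor R-hat for equal-length chains.

    Values close to 1 suggest the chains have mixed; values well above 1
    suggest the burn-in (or the run) is too short.
    """
    kept = [c[burn_in:] for c in chains]
    n = len(kept[0])
    if len(kept) < 2 or n < 2 or any(len(c) != n for c in kept):
        raise ValueError("need at least two chains of equal length >= 2")
    within = statistics.fmean([statistics.variance(c) for c in kept])          # W
    between = n * statistics.variance([statistics.fmean(c) for c in kept])     # B
    pooled = (n - 1) / n * within + between / n                                # V-hat
    return (pooled / within) ** 0.5


# Hypothetical demo: three well-mixed chains versus three chains stuck at different levels.
rng = random.Random(4)
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(3 * j, 1) for _ in range(2000)] for j in range(3)]
print(round(gelman_rubin(mixed, burn_in=500), 3), round(gelman_rubin(stuck, burn_in=500), 3))
```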


(e) Whatever you concluded in part (d), use a burn-in of 500 iterations for
this part. Report estimates of the posterior means, standard deviations
and equal-tailed 95% credible intervals for λ_1 and λ_2. Is there evidence
that generating system 2 is more prone to feedwater interruptions than
generating system 1, or could the difference between them be attributed
to random variation?